I. Exome Sequencing at the University of Washington


Quality control of sample DNA 

Quality control (QC) of the DNA samples included quantification with PicoGreen, confirmation 
of high-molecular weight DNA, tests for PCR amplification (four amplicons), and sex 
determination using a TaqMan assay. Prior to preparation for exome sequencing, all samples
were genotyped (Illumina BeadXpress) with 96 high frequency (30-50% minor allele frequency) 
exome specific SNPs, derived from the content found on genome wide arrays for both the 
Illumina and Affymetrix platforms. Genotype data at these variant sites were used to ensure 
sample tracking integrity through sample preparation and the sequencing pipeline. Samples 
failed QC if: (1) the total mass, concentration or integrity of DNA was low; (2) genotype call 
rates were low (<90%); or (3) sex-typing was inconsistent with the sample manifest. Following 
QC, all remaining genomic DNA (~3.5 µg) was reformatted into 96-well plates for library
preparation and exome capture.


Library production and exome capture 

All protocols for library construction and exome capture were automated on a Perkin-Elmer 
Janus II liquid handling robot, and performed in 96-well plate format. Samples were prepared by 
subjecting genomic DNA (~3.5 µg) to a series of shotgun library construction steps, including
acoustic fragmentation (Covaris), end-polishing and A-tailing, ligation of
sequencing adaptors, and PCR amplification. Sample shotgun libraries were captured for exome
enrichment using one of three in-solution capture products: CCDS 2008 (~26 Mb), the SeqCap
EZ Human Exome Library v1.0 (~32 Mb), or the SeqCap EZ Human Exome Library v2.0
(~34 Mb). Briefly, 1 µg of shotgun library was hybridized to biotinylated capture probes for 72
hours and recovered via streptavidin beads. Unbound DNA was washed away, and the captured
DNA was PCR amplified. Following capture, washing, and PCR, libraries were assessed again on the
Agilent Bioanalyzer for concentration, molecular weight distribution, and the presence of PCR
artifacts. The fragment size distributions of the libraries were highly consistent (typically
125 ± 15 bp).


Clustering and sequencing 

Library concentration and flow-cell loading cluster densities were determined using a 
standardized qPCR protocol (Kapa Biosystems). Using the automated Illumina cBot cluster 
station, non-multiplexed samples were processed in batches of eight (one for each lane of the 
flow-cell), diluted and denatured to their final effective loading concentrations. Hybridization 
was followed by cluster generation via bridge PCR as per standard protocols (Illumina).
Enriched libraries were sequenced on an Illumina GAIIx using paired-end 76 bp reads.


Read mapping and variant analysis for QC purposes 

Samples were processed from real-time base-calls (RTA 1.7 software [Bustard]), converted to
qseq.txt files, and aligned to a human reference (hg19) using BWA (Burrows-Wheeler Aligner).
Read-pairs not mapping within two standard deviations of the average library size (~125 ± 15
bp) were removed. Data were processed using the Genome Analysis Toolkit (GATK
refv1.2905). All aligned read data were subjected to “duplicate removal” (i.e., the removal of
reads with duplicate start positions), indel realignment (GATK IndelRealigner), and base quality
recalibration (GATK TableRecalibration). Variant detection and genotyping were performed
on the targeted exome regions using the UnifiedGenotyper (UG) tool from GATK. Variant
data for each sample were formatted (variant call format [VCF]) as “raw” calls for all samples,
and the filtration walker (GATK) was used to flag sites of lower quality or likely false
positives, i.e., sites with low quality scores (<50), allelic imbalance (0.75), long homopolymer
runs (>3), and/or low quality by depth (QD<5). Samples were considered complete when exome
targeted read coverage was >8x over >90% of the exome target. Typically, the mean target
coverage was 60-80x.
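The completion criterion above can be written as a simple check on per-base read depth. This is a minimal sketch and not the pipeline's actual code; the function names and the list-of-depths input are assumptions made for illustration.

```python
from typing import Sequence


def fraction_covered(depths: Sequence[int], min_depth: int = 8) -> float:
    """Return the fraction of target bases with read depth above min_depth."""
    if len(depths) == 0:
        raise ValueError("no target bases supplied")
    return sum(1 for d in depths if d > min_depth) / len(depths)


def sample_complete(depths: Sequence[int], min_depth: int = 8,
                    min_fraction: float = 0.90) -> bool:
    """Apply the '>8x over >90% of the exome target' rule described above."""
    return fraction_covered(depths, min_depth) > min_fraction
```

Both comparisons are strict, mirroring the ">" signs in the text; a sample with exactly 90% of bases above 8x would not yet count as complete under this reading.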


Data analysis QC at University of Washington 

Individual exome sequence data were evaluated against the following QC metrics:
(1) total reads, with a minimum of 30M PE reads; (2) library
complexity: the ratio of unique reads to total reads mapped to target; (3) capture efficiency: the
ratio of reads mapped to target versus the reads mapped to human; (4) coverage distribution:
90% at >8x required for completion; (5) capture uniformity; (6) raw error rates; (7) Ts/Tv ratio
(3.2 for known sites and 2.9 for novel sites); (8) distribution of known and novel variants relative
to dbSNP; (9) fingerprint concordance with 96 QC SNPs >99%; (10) homozygosity; (11)
heterozygosity. All QC metrics for both single-lane and merged data were reviewed to identify
data deviations from known or historical norms. Lanes/samples that failed QC were re-queued
for library prep for further sequencing.


II. Exome Sequencing at the Broad Institute 


Receipt/QC of Sample DNA 

Samples were shipped to the Biological Samples Platform laboratory at the Broad Institute of 
MIT and Harvard. DNA concentration was determined by the PicoGreen assay (Invitrogen,
Carlsbad, California) before storage in 2D-barcoded 0.75 mL Matrix tubes at -20°C in the
SmaRTStore (RTS, Manchester, UK) automated sample handling system. We performed initial
QC on all samples involving sample quantification (PicoGreen), confirmation of high-molecular
weight DNA, fingerprint genotyping, and gender determination (Illumina iSelect). Samples
failed QC if the total mass, concentration, or integrity of DNA, or the quality of preliminary
genotyping data, was too low.


Library construction and in-solution hybrid selection 

Starting with 3 µg of genomic DNA, library construction and in-solution hybrid selection were
performed as described by Fisher et al. A subset of samples, however, was prepared using the
Fisher et al. protocol with some slight modifications. Initial genomic DNA input into shearing
was reduced from 3 µg to 100 ng in 50 µL of solution. In addition, for adapter ligation, Illumina
paired end adapters were replaced with palindromic forked adapters with unique 8 base index
sequences embedded within the adapter.


Preparation of libraries for cluster amplification and sequencing 

After in-solution hybrid selection, libraries were quantified using quantitative PCR (kit
purchased from Kapa Biosystems) with probes specific to the ends of the adapters. This assay
was automated using Agilent’s Bravo liquid handling platform. Based on qPCR quantification,
libraries were normalized to 2 nM and then denatured with 0.1 N NaOH using Perkin-Elmer’s
MultiProbe liquid handling platform. A subset of the samples prepared using forked, indexed
adapters was quantified using qPCR, normalized to 2 nM using Perkin-Elmer’s Mini-Janus liquid
handling platform, and pooled by equal volume using the Agilent Bravo. Pools were then
denatured using 0.1 N NaOH. Denatured samples were diluted into strip tubes using the
Perkin-Elmer MultiProbe.


Cluster amplification and sequencing

Cluster amplification of denatured templates was performed according to the manufacturer’s 
protocol (Illumina) using either Genome Analyzer v3, Genome Analyzer v4, or HiSeq 2000 v2 
cluster chemistry and flowcells. After cluster amplification, SYBR Green dye was added to all 
flowcell lanes, and a portion of each lane visualized using a light microscope, in order to confirm 
target cluster density. Flowcells were sequenced either on Genome Analyzer II using v3 and v4 
Sequencing-by-Synthesis Kits, then analyzed using RTA v1.7.48, or on HiSeq 2000 using HiSeq 
2000 v2 Sequencing-by-Synthesis Kits, then analyzed using RTA v1.10.15. All samples were 
run on 76 cycle, paired end runs. For samples prepared using forked, indexed adapters,
Illumina’s Multiplexing Sequencing Primer Kit was also used.


Read mapping and variant analysis 

Samples were processed from real-time base-calls (RTA 1.7 software [Bustard]), converted to
qseq.txt files, and aligned to a human reference (hg19) using BWA (Burrows-Wheeler Aligner).
Aligned reads duplicating the start position of another read were flagged as duplicates and not
analyzed (“duplicate removal”). Data were processed using the Genome Analysis Toolkit
(GATK v1.1.3). Reads were locally realigned (GATK IndelRealigner) and their base qualities
were recalibrated (GATK TableRecalibration). Variant detection and genotyping were performed
on both exomes and flanking 50 bp of intronic sequence using the UnifiedGenotyper (UG) tool
from the GATK. Variant data for each sample were formatted (variant call format [VCF]) as
“raw” calls for all samples. SNP and indel sites were flagged using the Variant Filtration walker
(GATK) to mark sites of low quality that are likely false positives. SNPs were marked as
potential errors if they exhibited strong strand bias (SB >= 0.10), low average quality (QD < 5.0),
or fell in a homopolymer run (HRun > 4). Indels were marked as potential errors for low quality
(QUAL < 30.0), low average quality (QD < 2.0), or if the site exhibited strong strand bias (SB >
-1.0). Samples were considered complete when exome targeted read coverage was >20x over
>80% of the exome target.

Data Analysis QC 

Processed sequence data were required to match known fingerprint genotypes for their respective 
samples, and to achieve a sequence coverage of >20x for >70% of targeted bases. Variant calls 
were evaluated on both bulk and per-sample properties: novel and known variant counts, Ts/Tv 
ratio, Het/Hom ratio, and Deletion/Insertion ratio. Both bulk and sample metrics were compared 
to historical values for exome sequencing projects at the Broad. No significant deviation of the
ESP calls or ESP samples from historical values was noted.


III. Joint Calling of Variants for Entire ESP Project at the University of Michigan


SNVs were called using the UMAKE pipeline at the University of Michigan, which allowed all
samples to be analyzed simultaneously, both for variant calling and filtering. Briefly, we used
as input BAM files summarizing BWA alignments generated at the University of Washington and
the Broad Institute, refined by duplicate removal, recalibration, and indel re-alignment. We
excluded all reads that were not confidently mapped (Phred-scaled mapping quality < 20) from
further analysis. To avoid PCR artifacts, we clipped overlapping ends in paired reads. We then
computed genotype likelihoods for exome targeted regions and 50 flanking bases, accounting for
per base alignment quality (BAQ) using samtools. Variable sites and their allele frequencies
were identified using a maximum-likelihood model, implemented in glfMultiples. These analyses
assumed a uniform prior probability of polymorphism at each site.


Variant and Sample Level Quality Control 


SVM Filter: We used a support vector machine (SVM) classifier to separate likely true positive 
and false-positive variant sites using a battery of SNP quality metrics. These include allelic 
balance (the proportional representation of each allele in likely heterozygotes), base quality 
distribution for sites supporting the reference and alternate alleles, and the distribution of 
supporting evidence between strands and sequencing cycle, amongst others. We used as the 
positive training set variants identified by dbSNP or 1000 Genomes and we used variants that 
failed multiple filters as the negative training set. We found this method to be effective at 
removing sequencing artifacts while preserving good-quality data, as indicated by the Ts/Tv ratio 
for previously known and newly identified variant sites, the proportion of high frequency 
variants overlapping with dbSNP, and the ratio of synonymous to non-synonymous variants, as 
well as attempts at validation of a subset of sites. A total of 1,908,614 SNVs passed the SVM
filter.


Filter based on Depth10: There were 52 pairs of duplicate samples in the final set of exomes
from the Exome Sequencing Project - ESP6800 dataset. For each of these 52 pairs, we calculated
the non-reference genotype concordance rates. The non-reference concordance (NRC) rate is a
measure of concordance that only considers genotypes where at least one sample was called a
heterozygote or a non-reference homozygote. Missing genotypes do not contribute to this
calculation. Standard concordance rates for rare variants tend to be dominated by an abundance
of reference homozygous calls; thus we chose non-reference concordance rates as a measure of
genotyping specificity. To investigate whether to use a genotype filter based on read depth, we
calculated NRC rates across a variety of read depth cutoffs (depth = 1, 5, 10, 15, 20, 25, 30). For
each cutoff c, we replaced any genotype with an associated read depth less than c with a missing
value. As a measure of sensitivity, we calculated the total number of genotypes retained after
enforcing the read depth cutoff. For the 52 pairs of duplicates, Figure 1 shows the NRC rate by
the percent of total genotypes retained for a variety of read depth cutoffs. From this plot we
concluded that a filter based on a read depth of 10 markedly improved concordance rates while
maintaining over 90% of the total genotypes. Thus we replaced genotypes with a corresponding
read depth less than 10 with a missing value in the gene-based analysis. This was not applied to
the single-variant analysis as this analysis involved common variants.

Figure 1: Non-reference concordance rates are plotted on the y-axis versus % genotypes retained on the x-axis for
each of the 52 duplicate samples in the ESP6800. All data-points are shown in black; the mean non-reference
concordance for each read depth cutoff is shown in red.
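The depth-cutoff experiment described above can be sketched for one duplicate pair as follows. This is an illustrative reading of the definitions in the text, not the code used for the analysis; the genotype encoding (alternate-allele count, with None for missing) and all function names are assumptions.

```python
from typing import List, Optional

Genotype = Optional[int]  # alternate-allele count (0, 1, 2); None means missing


def apply_depth_cutoff(genotypes: List[Genotype], depths: List[int],
                       cutoff: int) -> List[Genotype]:
    """Set any genotype with read depth below the cutoff to missing."""
    return [g if (g is not None and d >= cutoff) else None
            for g, d in zip(genotypes, depths)]


def non_reference_concordance(a: List[Genotype],
                              b: List[Genotype]) -> Optional[float]:
    """NRC between two duplicate samples genotyped at the same sites."""
    compared = 0
    agree = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # missing genotypes do not contribute
        if ga == 0 and gb == 0:
            continue  # skip sites where both calls are reference homozygous
        compared += 1
        if ga == gb:
            agree += 1
    return agree / compared if compared else None


def fraction_retained(genotypes: List[Genotype]) -> float:
    """Share of genotypes that remain non-missing after filtering."""
    return sum(g is not None for g in genotypes) / len(genotypes)
```

Running `apply_depth_cutoff` at each cutoff and then computing `non_reference_concordance` and `fraction_retained` gives one point per cutoff for the pair, which is the kind of trade-off the figure summarizes.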


Filter based on mean Depth 500: We further investigated a variant-level filter based on average
per-SNP read depth. We used transition-transversion (ti-tv) ratios as a means of considering the
overall quality of a set of SNPs. In the exome, ti-tv ratios near a value of three are thought to be
indicative of true positive SNPs. We noticed a general trend of increasing ti-tv ratios as the
average per-variant depth increased, with a decrease in ti-tv ratios at very high average depths
(Figure 2). This is most likely due to pseudo-SNPs, which arise when regions of the genome
with close sequence homology (e.g., only 1 base pair differentiates the two sequences) are
subjected to short-read shotgun sequencing and the alignment software preferentially maps the
reads from both regions to only one location. This results in a pile-up of reads at the preferential
location that appears to be polymorphic, and an incorrect heterozygous call is made. To guard
against these pseudo-SNPs we filtered out all variants with an average depth greater than 500.
The low ti-tv ratios at very low depths were accounted for by enforcing the read depth 10 filter
described above.

Figure 2: Transition to transversion ratios against average variant-level read depth bins.
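As a rough illustration of the quantities involved, the sketch below computes a ti-tv ratio for a set of SNVs and applies the mean-depth ceiling of 500. The tuple layout and function names are hypothetical and chosen only for this example; it is not the code used to produce the figure.

```python
from typing import List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

Variant = Tuple[str, str, float]  # (ref allele, alt allele, mean read depth)


def is_transition(ref: str, alt: str) -> bool:
    """A<->G and C<->T changes are transitions; other SNVs are transversions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(variants: List[Variant]) -> float:
    """Ratio of transition to transversion counts in a list of SNVs."""
    ti = sum(1 for ref, alt, _ in variants if is_transition(ref, alt))
    tv = len(variants) - ti
    return ti / tv if tv else float("inf")


def drop_high_depth(variants: List[Variant],
                    max_mean_depth: float = 500.0) -> List[Variant]:
    """Remove variants whose average depth is greater than the ceiling."""
    return [v for v in variants if v[2] <= max_mean_depth]
```

Grouping variants into depth bins and calling `ti_tv_ratio` per bin would give the kind of curve the text describes, with the ratio falling off in the highest-depth bins.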


Principal Component Analysis and Ancestry Designation: After applying the SVM and
read depth 10 filters to the ESP6800 call-set, we ran a principal component analysis (PCA) to
determine sample-level outliers and to cross-check self-reported ancestry. To do so we only
included SNPs with a minor allele frequency (MAF) greater than or equal to 0.1% and a call-rate
of greater than 95%. Only autosomal SNPs were included in the PCA. We ran the PCA in
PLINK after pruning out SNPs in linkage disequilibrium (LD). This was done by looking in
windows of 50 SNPs and shifting the windows 5 SNPs at each step. If a pair of SNPs had a
genotype R² value greater than 0.5, one of the SNPs was removed. The resulting SNPs were used
to determine a matrix of genome-wide Identity by State (IBS) pairwise distances, which were
subsequently input to the PLINK multidimensional scaling (MDS) algorithm. Figure 3 shows the
first two dimensions from the MDS (analogous to the first two principal components). The first
two PCs clearly separate the African American (AA) samples from the European American (EA)
samples. However, there is a clear group of admixed individuals between these two clusters
where many self-reported Hispanic individuals were clustered. We removed from all subsequent
analyses those individuals of indeterminate genetic ancestry located between the two vertical
lines in Figure 3. For simplicity, we also removed from analysis any individual self-reporting
race different from AA or EA. Of the remaining samples, all points to the left of the left-most
vertical bar were designated as having AA genetic ancestry. All points to the right of the
right-most vertical bar were designated as having EA genetic ancestry. Those samples with
discrepant self-reported and designated ancestry were removed from all subsequent analyses.

Figure 3: The first two principal components from the ESP6800 call-set. Self-reported EAs are shown in orange,
AAs in light-blue, Hispanics in red, Asians in black, and Native Americans in dark-blue. Missing self-reported race
is shown in green.


Analysis of Relatedness: After designating samples to AA and EA ancestry groups, we ran a
race-stratified kinship analysis to identify any cryptically related individuals in the ESP6800
call-set. To do so we only considered variants that passed the SVM filter and the Depth 500
filter, after replacing genotypes with a corresponding read depth less than 10 with a missing value.
Furthermore, we only considered variants that were in the intersection of the four capture targets
that were used. The degree of relatedness was estimated using the KING software. As with the
MDS analysis, only LD-pruned autosomal variants with MAF > 0.001 were used as input. Pairs
of samples with kinship coefficient ranges of > 0.354, [0.177, 0.354], [0.0884, 0.177), and [0.0442,
0.0884) were designated as duplicates, 1st-degree, 2nd-degree, and 3rd-degree relatives,
respectively. Figure 4 displays the estimated kinship coefficients plotted against the proportion
of SNPs with zero identical by state.

Figure 4: Estimated kinship coefficients plotted against the proportion of SNPs with zero identical by state.
Duplicates are shown in black, 1st-degree relatives in yellow, 2nd-degree relatives in blue, and 3rd-degree relatives in
red.
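The kinship ranges quoted above translate directly into a threshold lookup. The sketch below is illustrative only; the function name and the "unrelated" label for pairs below the lowest range are assumptions, since the text does not name that category.

```python
def classify_kinship(kinship: float) -> str:
    """Map an estimated kinship coefficient to the relationship classes in the text."""
    if kinship > 0.354:
        return "duplicate"
    if kinship >= 0.177:
        return "1st-degree"
    if kinship >= 0.0884:
        return "2nd-degree"
    if kinship >= 0.0442:
        return "3rd-degree"
    return "unrelated"  # label assumed; the text does not name this class
```

For example, a pair with an estimated coefficient of 0.25 would be treated as 1st-degree relatives, and a pair at 0.05 as 3rd-degree relatives.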


Hardy-Weinberg Variant Level Filter: After running the kinship analysis, we considered 
whether variants were in Hardy-Weinberg equilibrium (HWE). This analysis was stratified by 
race, and only 1 individual from each duplicate/relative pair was included (the sample with the 
higher call-rate). Variants with a p-value testing HWE < 5x10” based on an exact test for
HWE were excluded from further analyses.

Sample Level Missingness: After enforcing the read depth 10 cutoff, we calculated sample level
missing rates (Figure 5). There is a clear difference between the four target capture arrays that
were used. Within each of the four targets, only one sample (in black) was a clear outlier. This
sample was excluded from further analyses.

Figure 5: Sample level missing rates after enforcing a DP10 filter on the genotypes. Missing rates for the 4 capture
targets are shown in black, red, blue and yellow.


Sample Level Homozygosity: For each sample we calculated inbreeding coefficients in PLINK.
We used the same set of variants that were included in the MDS analysis. One EA sample was
found to have an exceedingly high inbreeding coefficient compared to the other samples (Figure
6). This sample was removed from subsequent analyses.

Figure 6: Sample level homozygosity estimates stratified by the three race groups (AA, EA, Indeterminate).


Sex Check: To guard against potential sample swaps, we cross-checked self-reported sex against 
a normalized measure of read depth on the X and Y chromosomes. Because of the way the 
samples were processed, we normalized the read depth for the first 2,484 samples differently 
from the last 4,339 samples. Figure 7 shows the normalized coverage on the two sex 
chromosomes. There are very clearly two distinct clusters (males and females) in each plot. 
Samples where the self-reported sex was clearly different from the XY coverage cluster
(highlighted in Figure 7) were considered sample swaps and excluded from further analysis.

Figure 7: Plots of normalized chrY depth of coverage versus normalized chrX depth of coverage. The first 2,484
samples are shown in the left panel; the last 4,339 samples are shown in the right panel. Samples self-reported as
male but falling in the female cluster are displayed in red; self-reported females that fall in the male cluster are
displayed in yellow.


GWAS Concordance: Where genome-wide SNP array data were available, we ran
concordance checks on the ESP variants that overlapped with the variants typed on the
arrays. Samples identified as having very low concordance rates were subsequently dropped
from further analysis due to the strong likelihood that they were sample swaps.


Variant Level Missingness: We did not enforce a call-rate filter for the per-variant analyses. For 
the gene-level analyses, for each gene we first removed samples with >10% missing rate for the 
variants in that gene. Once these samples were removed we filtered out variants with missing
rate > 10%.
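The two-step gene-level filter described above (samples first, then variants) can be sketched as below. The matrix layout, function name, and return format are assumptions for illustration; this is not the analysis code itself.

```python
from typing import List, Optional, Tuple


def filter_gene(genotypes: List[List[Optional[int]]],
                max_missing: float = 0.10) -> Tuple[List[int], List[int]]:
    """genotypes[i][j] is sample i at variant j of one gene; None is missing.

    Returns the indices of retained samples and retained variants.
    """
    if not genotypes or not genotypes[0]:
        return [], []
    n_variants = len(genotypes[0])
    # Step 1: drop samples missing more than max_missing of the gene's variants.
    samples = [i for i, row in enumerate(genotypes)
               if sum(g is None for g in row) / n_variants <= max_missing]
    if not samples:
        return [], []
    # Step 2: among retained samples, drop variants missing more than max_missing.
    variants = [j for j in range(n_variants)
                if sum(genotypes[i][j] is None for i in samples) / len(samples)
                <= max_missing]
    return samples, variants
```

The order matters: computing variant missingness only after low-quality samples are removed keeps a few poorly covered samples from causing otherwise well-called variants to be dropped.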


Variant Annotation: All variants in the ESP6800 were submitted to the SeattleSeq annotation
server (http://snp.gs.washington.edu/SeattleSeqAnnotation134/) on May 29, 2012. We used
annotation version 134, the hg19 build of the human reference genome, and the NCBI full genes
(NM, XM) gene model option. For variants mapping to multiple transcripts, we retained the most
damaging classification (from most damaging to least: nonsense, splice, missense, synonymous,
utr, other).
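Retaining the most damaging classification amounts to taking a minimum under the severity ordering given above. In this sketch the label strings are taken from that list rather than from actual SeattleSeq output, and treating an unrecognized label as "other" is an assumption.

```python
from typing import Iterable

# Ordering from the text, most damaging first.
SEVERITY_ORDER = ["nonsense", "splice", "missense", "synonymous", "utr", "other"]
_RANK = {label: rank for rank, label in enumerate(SEVERITY_ORDER)}


def most_damaging(classifications: Iterable[str]) -> str:
    """Return the most damaging of a variant's per-transcript classifications."""
    labels = list(classifications)
    if not labels:
        raise ValueError("variant has no transcript annotations")
    # Labels outside the list are ranked with "other" (an assumption).
    return min(labels, key=lambda label: _RANK.get(label, _RANK["other"]))
```

A variant annotated as synonymous in one transcript and missense in another would therefore be carried forward as missense.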


IV. Exome Array Genotyping 


Study samples were processed on the HumanExome BeadChip v1.0 (Illumina, Inc., San Diego,
CA) using standard protocols suggested by the manufacturer at local genotyping centers.
Genotypes were assigned using GenomeStudio v2010.3 using the calling algorithm/genotyping
module version 1.8.4 along with the custom cluster file StanCtrExChp_CEPH.egt. At most
genotyping centers, these calls were supplemented by the application of the zCall rare variant
calling algorithm. Across ~66,000 samples from the CHARGE Consortium, the raw data files
for the samples were assembled into a single project for joint calling. Genotype data for the four
APOC3 mutations (exm957809, exm957810, exm957815, and exm957817) were extracted prior
to analysis.


V. Study Participants 


Discovery study samples: The U.S. National Heart, Lung, and Blood Institute’s Exome
Sequencing Project (ESP) sought to use exome sequencing as a tool to discover novel genes and
mechanisms contributing to heart, lung, and blood disorders
(https://esp.gs.washington.edu/drupal/). Participants for the present analysis were 3,734
individuals who had both exome sequence and plasma triglycerides available (Table S1).
Participants were enrollees in seven population-based cohorts [Atherosclerosis Risk in
Communities (ARIC), Coronary Artery Risk Development in Young Adults (CARDIA),
Cardiovascular Health Study (CHS), Framingham Heart Study (FHS), Jackson Heart Study
(JHS), Multi-Ethnic Study of Atherosclerosis (MESA), and the Women’s Health Initiative
(WHI)] and a study of early-onset myocardial infarction (Myocardial Infarction Genetics
Consortium, MIGen).


Replication Study Samples: We genotyped 41,671 African-Americans (AA) or participants of 
European ancestry (EA) from seven replication studies: ARIC (EA and AA), JHS (AA), WHI 
(EA and AA), Malmö Diet and Cancer Study Cardiovascular Cohort (MDC-CVA, EA), Ottawa
Heart Study (EA), Precocious Coronary Artery Disease (PROCARDIS) study (EA), and Italian
Atherosclerosis, Thrombosis, and Vascular Biology (ATVB) study (EA). These participants
were independent from those sequenced in the discovery study.


VI. APOC3 Genotypes: Replication For Plasma Lipids 


To follow up the strongest result for triglycerides observed in the discovery sample, i.e., APOC3,
we performed genotyping of four mutations (R19X, IVS2+1 G>A, A43T, and IVS3+1 G>T)
using the Illumina HumanExome Beadchip. Three of the four mutations are predicted to
severely disrupt APOC3 function, i.e., lead to loss of function (LoF). LoF variants included a
nonsense substitution (i.e., R19X) and two DNA sequence variants disrupting a splice site (i.e.,
IVS2+1 G>A and IVS3+1 G>T). Each of these APOC3 variants and a fourth, missense variant
A43T, were associated with lower plasma triglycerides, suggesting that all four variants lead to
loss of APOC3 function.

We genotyped 41,671 African-Americans (AA) or participants of European ancestry
(EA) from seven replication studies (Table S2). These participants were independent from those
sequenced in the discovery study. We performed race-specific linear regression with the outcome
variable of plasma triglycerides (or other lipid fractions), independent variable of variant allele
carrier status (coded as 0,1,2), and covariates of age, gender, and at least two principal
components of ancestry. We also considered a model where carriers of any of the four LoF
mutations were collapsed into a single independent variable - APOC3 LoF carrier. Statistical
evidence across the studies was summarized through meta-analysis with inverse of the variance
as weights.
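Inverse-variance weighting, as named above, combines per-study effect estimates with weights equal to one over the squared standard error. The following is a minimal fixed-effect sketch; the function name and inputs are illustrative assumptions, and the example values in the note below are not study results.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(betas: Sequence[float],
                          std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool per-study effect estimates using weights of 1 / SE^2.

    Returns the pooled estimate and its standard error.
    """
    if len(betas) != len(std_errors) or len(betas) == 0:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / (se * se) for se in std_errors]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se
```

With two hypothetical studies reporting effects of 1.0 (SE 1.0) and 3.0 (SE 0.5), the weights are 1 and 4, so the pooled estimate of 2.6 sits closer to the more precise study.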


VII. APOC3 Genotypes: Association With CHD 


We next tested the association of APOC3 LoF carrier status with CHD in EA, AA, and Hispanic
ancestry (HA) participants from 15 studies. Participants were genotyped using the Illumina
HumanExome Beadchip. Descriptions of the studies and the definitions for CHD outcomes are
provided in Table S3. We calculated P values for the association tests and the confidence
intervals for the odds ratios by using exact methods. We performed meta-analyses by using
Cochran-Mantel-Haenszel statistics for stratified 2×2 tables. The Cochran-Mantel-Haenszel
method combines score statistics rather than Wald statistics and is particularly attractive when
the observed odds ratios are zero. All the results were obtained from the Freq procedure in SAS.
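To make the stratified 2×2 approach concrete, the sketch below computes the Mantel-Haenszel pooled odds ratio and the asymptotic Cochran-Mantel-Haenszel chi-square statistic (1 degree of freedom, no continuity correction). It is a textbook illustration in Python, not the SAS procedure used for the reported results, and it does not implement the exact P values or exact confidence intervals mentioned above; the table layout and function name are assumptions.

```python
from typing import Sequence, Tuple

# One stratum (study): (a, b, c, d) =
# carrier cases, carrier controls, non-carrier cases, non-carrier controls
Table = Tuple[int, int, int, int]


def cmh(strata: Sequence[Table]) -> Tuple[float, float]:
    """Return (Mantel-Haenszel pooled odds ratio, CMH chi-square statistic)."""
    sum_a = sum_expected = sum_var = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        sum_a += a
        # Expected count and variance of cell a under no association.
        sum_expected += (a + b) * (a + c) / n
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    odds_ratio = or_num / or_den if or_den else float("inf")
    statistic = (sum_a - sum_expected) ** 2 / sum_var if sum_var else 0.0
    return odds_ratio, statistic
```

A stratum with no carrier cases, such as `(0, 10, 20, 70)`, contributes an observed odds ratio of zero yet still adds to the score-based statistic, which is the situation the text highlights.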
As an alternate approach, we performed logistic regression where the outcome variable
was either incident CHD or prevalent CHD, the independent variable was APOC3 LoF carrier
status (coded as 0 or 1), and the covariates were age, sex, and at least two principal components of
ancestry; these analyses yielded similar results (data not shown).


VIII. Plasma apoC-III Protein Concentration And Risk For Incident CHD


Blood was drawn from fasting participants in the Framingham Heart Study Offspring cohort
examination cycle 5 (1991-1995). Plasma apoC-III protein concentration was assessed in
3,238 individuals using a commercially available immunochemical assay from Wako
Diagnostics (Richmond, USA). All participants underwent continuous surveillance for incident
CHD events until December 31, 2010. CHD events included fatal MI, non-fatal MI, angina
pectoris, and coronary insufficiency as described previously. We prospectively studied 2,913
persons without prevalent CHD. Using proportional-hazards regression, we examined the
relations of plasma apoC-III (natural logarithmically transformed) to risk of incident CHD. We
tested two models: (1) age- and sex-adjusted; and (2) multivariable models adjusting for age,
sex, smoking, diabetes mellitus, LDL cholesterol, HDL cholesterol, hypertension treatment,
systolic and diastolic blood pressure, lipid-lowering treatment, and fasting serum glucose.

In order to evaluate plasma apoC-III protein in the secondary prevention setting, we
studied the association of plasma apoC-III protein with incident total and cardiovascular
mortality in the Verona Heart Study. We recruited 794 subjects with angiographic coronary
artery disease and measured plasma apoC-III as previously described. During a median
follow-up of 59 months, there were 134 deaths, with 92 due to cardiovascular disease (coronary
artery disease, heart failure, peripheral artery disease, or cerebrovascular disease). Using
proportional-hazards regression, we examined the relations of plasma apoC-III (natural
logarithmically transformed) to risk of incident total or cardiovascular mortality. We tested two
models: (1) age- and sex-adjusted; and (2) multivariable models adjusting for age, sex, smoking,
diabetes mellitus, LDL cholesterol, HDL cholesterol, hypertension, lipid-lowering treatment, and
fasting serum glucose.


IX. APOC3 Genotypes and Association With Hepatic Steatosis

Between 2002 and 2005, 1,400 individuals from the Framingham Offspring Study and 2,011
individuals from the Third Generation cohort underwent multidetector computed tomography, on
which we evaluated liver attenuation as previously described. We tested the association of APOC3
LoF genotypes with CT liver fat after inverse normal transformation. Covariates in the regression
models included age, age², gender, and number of alcoholic drinks per week.


Figure S1. Quantile-quantile plot of results testing the association of plasma triglycerides with
single coding sequence variants (A) and with variants aggregated at the gene level (B). AA
denotes African Americans; EA, European ancestry.

[Panel 1A, single-variant: observed versus expected -log10(P value); series: AA, EA, Race Combined]
[Panel 1B, gene-burden: observed versus expected -log10(P value); series: AA, EA, Race Combined]

Figure S2. Distribution of plasma apolipoprotein C-III concentrations in the Framingham Heart
Study Offspring cohort (n=3,237) in mg/dl.

Figure S3. Mean plasma apolipoprotein C-III concentration in carriers of APOC3 variants
[R19X (n=2) or splice site IVS2+1 G>A (n=11)] compared with non-carriers in the Framingham
Heart Study Offspring cohort. Mean plasma apoC-III concentration in carriers (n=13) was 8.96
mg/dl (SD 1.69) whereas the mean concentration in non-carriers (n=2,694) was 16.63 mg/dl (SD
4.71) (P=8 x IO, 


[Figure S3 plot: plasma apoC-III (y-axis) by APOC3 carrier status (x-axis); carriers, n=13; non-carriers, n=2,694]

Table S1. Baseline characteristics of the 3,734 participants with exome sequence data and plasma triglyceride measurements


Study                                        ARIC           CARDIA         CHS            MIGen          FHS            JHS            MESA           WHI
N                                            798            195            204            11             416            311            394            1405
Mean age ± SD, yrs                           53.8 ± 5.8     26.5 ± 2.9     73.5 ± 5.8     52.4 ± 9.0     39.7 ± 9.9     54.0 ± 12.2    61.4 ± 9.7     63.3 ± 7.6
Gender, % female                             53.1% (n=424)  43.1% (n=84)   25.5% (n=52)   81.8% (n=9)    34.1% (n=142)  55.3% (n=172)  37.8% (n=149)  100% (n=1405)
African-American ancestry, %                 37% (n=295)    44% (n=86)     31% (n=64)     100% (n=11)    0% (n=0)       100% (n=311)   38% (n=151)    52% (n=734)
Total cholesterol, mg/dl                     226.0 ± 69.0   185.6 ± 49.1   207.0 ± 46.9   184.5 ± 52.2   197.3 ± 43.8   206.0 ± 53.4   193.5 ± 51.3   230.3 ± 46.9
Low-density lipoprotein cholesterol, mg/dl   148.0 ± 68.3   116.8 ± 45.6   128.7 ± 46.5   112.9 ± 47.6   125.1 ± 40.0   132.8 ± 51.4   118.4 ± 48.4   147.1 ± 45.0
High-density lipoprotein cholesterol, mg/dl  48.4 ± 17.3    52.0 ± 13.1    47.8 ± 14.5    43.8 ± 4.3     47.3 ± 13.7    49.3 ± 15.7    50.2 ± 15.0    54.1 ± 15.3
Triglycerides, mg/dl                         150.5 ± 98.7   84.5 ± 71.3    156.4 ± 81.0   135.7 ± 62.7   129.8 ± 110.8  122.9 ± 83.1   125.8 ± 73.8   142.4 ± 84.2


For lipid traits, data shown are mean ± standard deviation. ARIC denotes Atherosclerosis Risk in Communities Study; CARDIA, Coronary Artery Risk Development in Young Adults; CHS,
Cardiovascular Health Study; MIGen, Myocardial Infarction Genetics Consortium; FHS, Framingham Heart Study; JHS, Jackson Heart Study; MESA, Multi-Ethnic Study of Atherosclerosis;
WHI, Women’s Health Initiative.

Table S2. Characteristics of participants in replication study of APOC3 coding sequence variants with plasma lipid levels


Cohort     Status    Ancestry  N       Mean age ± SD, yrs  Gender,   Total cholesterol,  LDL cholesterol,  HDL cholesterol,    Triglycerides
                                                           % female  mg/dl               mg/dl             mg/dl
ARIC       -         EA        10,349  54.4 ± [illegible]  53%       214.5 ± 38.6        137.2 ± 35.5      50.7 ± 16.7         136.3 ± 90.6
FHS        -         EA        7,033   37.7 ± 9.6          53%       192.3 ± 37.8        118.9 ± 34.3      53.0 ± 15.5         102.7 ± 82.4
MDC-CVA    -         EA        4,924   57.6 ± 5.9          59%       239.5 ± 43.8        162.4 ± 40.2      53.3 ± 14.4         121.1 ± 69.9
WHI        -         EA        4,157   66.9 ± 6.6          100%      239.0 ± 44.1        155.4 ± 39.7      51.6 ± 13.5         161.1 ± 91.2
OHS        Cases     EA        800     54.1 ± 9.3          17%       238.6 ± 48.0        155.9 ± 39.6      42.7 ± 13.9         234.3 ± 203.1
OHS        Controls  EA        2,111   74.7 ± 6.0          49%       219.6 ± 40.2        137.9 ± 34.2      56.7 ± 16.5         126.7 ± 85.7
PROCARDIS  Cases     EA        1,070   58.4 ± 7.6          40%       234.2 ± 48.9        146.7 ± 44.4      47.8 ± 13.3         190.3 ± 120.5
PROCARDIS  Controls  EA        1,776   67.0 ± 4.8          49%       219.5 ± 38.8        132.5 ± 32.7      [illegible] ± 15.1  143.0 ± 81.1
ATVB       Cases     EA        1,252   39.7 ± 4.9          12%       221.3 ± 56.0        147.7 ± 52.3      42.0 ± 13.0         177.7 ± 132.4
ATVB       Controls  EA        960     39.3 ± 5.1          14%       201.5 ± 37.2        125.7 ± 34.9      49.1 ± 12.6         [illegible] ± 70.7
ARIC       -         AA        2,933   53.7 ± 5.8          62%       213.5 ± 42.2        136.6 ± 39.5      55.0 ± 17.4         113.1 ± 84.3
JHS        -         AA        2,154   52.9 ± 12.7         63%       205.6 ± 42.7        133.8 ± 39.0      [illegible] ± 14.6  103.2 ± 78.0
WHI        -         AA        2,152   67.1 ± 5.2          100%      233.6 ± 45.8        153.9 ± 42.7      57.0 ± 14.5         113.2 ± 69.6


For lipid traits, data shown are mean ± standard deviation. EA denotes European ancestry; AA, African American; LDL, low-density lipoprotein; HDL, high-density lipoprotein;
ARIC, Atherosclerosis Risk in Communities Study; FHS, Framingham Heart Study; MDC-CVA, Malmo Diet and Cancer Study Cardiovascular Arm; WHI, Women’s Health Initiative;
OHS, Ottawa Heart Study; PROCARDIS, Precocious Coronary Artery Disease Study; ATVB, Italian Atherosclerosis, Thrombosis, and Vascular Biology Study; JHS, Jackson Heart Study.

Table S3. Definitions of coronary heart disease across fifteen studies


Study: WHI
Design: Prospective, cohort
Definition of CHD: WHI participants included in this study were 50-79 years of age at enrollment in 1993-1998. These women were followed for development of clinical CHD until 2012. A CHD event was defined as a definite or probable myocardial infarction, silent myocardial infarction, coronary revascularization, hospitalized angina, or death due to CHD.
Ascertainment of controls: Participants free of CHD on follow-up

Study: FHS
Design: Prospective, cohort
Definition of CHD: Incident nonfatal or fatal MI, angina pectoris, and coronary insufficiency
Ascertainment of controls: Participants free of CHD on follow-up

Study: MDC-CVA
Design: Prospective, cohort
Definition of CHD: Incident nonfatal or fatal MI
Ascertainment of controls: Participants free of CHD on follow-up
Refs: 20

Study: ARIC
Design: Prospective, cohort
Definition of CHD: Incident definite or probable MI, silent MI (indicated by electrocardiogram) between 4 examinations in 1987-1998, definite CHD death, or coronary revascularization
Ascertainment of controls: Participants free of CHD on follow-up

Study: IPM
Design: Case-control
Definition of CHD: CAD cases were ascertained from the Institute for Personalized Medicine Biobank; CAD was defined using the electronic health record. Cases had documented ICD9 codes 410.xx to 414.xx and (abnormal stress test or abnormal coronary angiography)
Ascertainment of controls: Controls were individuals in the biobank who did not meet case criteria
Refs: NIH dbGaP Study Accession: phs000388.v1.p1

Study: ATVB
Design: Case-control
Definition of CHD: MI in men or women < 45 yo
Ascertainment of controls: No history of thromboembolic disease
Refs: [illegible]

Study: VHS
Design: Case-control
Definition of CHD: Documented diagnosis of MI, coronary artery bypass grafting (CABG), or CAD (by angiography) in males < 50 yo and in females < 60 yo
Ascertainment of controls: Coronary angiography normal
Refs: 14

Study: Ottawa
Design: Case-control
Definition of CHD: Angiography (>1 coronary vessel with >50% stenosis); < 50 yo for males and < 60 yo for females; without type 2 diabetes
Ascertainment of controls: Asymptomatic, males >65, females >70
Refs: [illegible]

Study: PROCARDIS
Design: Case-control
Definition of CHD: Symptomatic CAD before age 66 years; 80% of cases also had a sibling in whom CAD had been diagnosed before age 66 years. CAD was defined as clinically documented evidence of myocardial infarction (80%), coronary artery bypass graft (10%), acute coronary syndrome (6%), coronary angioplasty (1%) or stable angina (hospitalization for angina or documented obstructive coronary disease) (3%)
Ascertainment of controls: No personal or sibling history of CAD before age 66 years.
Refs: 23

Study: HUNT
Design: Case-control
Definition of CHD: MI cases collected by the Norwegian Nord-Trøndelag health study (HUNT) Biobank
Ascertainment of controls: Free of MI on Norwegian ischemic heart disease national register
Refs: 2

Study: GoDARTS CAD
Design: Case-cohort
Definition of CHD: The GoDARTS (Genetics of Diabetes Audit and Research in Tayside Scotland) study is a joint initiative of the Department of Medicine and the Medicines Monitoring Unit (MEMO) at the University of Dundee, the diabetes units at three Tayside healthcare trusts (Ninewells Hospital and Medical School, Dundee; Perth Royal Infirmary; and Stracathro Hospital, Brechin), and a large group of Tayside general practitioners with an interest in diabetes care. Cases were a first-ever CAD event, defined as fatal and non-fatal myocardial infarction, unstable angina or coronary revascularisation
Ascertainment of controls: Controls were free of coronary artery disease, stroke and peripheral vascular disease
Refs: 25

Study: EPIC CAD
Design: Nested case-cohort
Definition of CHD: The EPIC (European Prospective Study into Cancer and Nutrition) study sub-cohorts from the UK were used; subjects were collected in collaboration with general practitioners, mainly in Cambridgeshire and Norfolk. Cases were individuals who developed a fatal or non-fatal CAD during an average follow-up of 11 years, until June 2006. Participants were identified if they had a hospital admission and/or died with CAD as the underlying cause. CAD was defined as cause of death codes ICD9 410-414 or ICD10 I20-I25, and hospital discharge codes ICD10 I20.0, I21, I22 or I23 according to the International Classification of Diseases, 9th and 10th revisions.
Ascertainment of controls: Controls were study participants who remained free of any cardiovascular disease during follow-up (defined as ICD9 401-448 and ICD10 I10-I79).

Study: FIA3
Design: Nested case control
Definition of CHD: Cases of MI occurring in participants from Västerbotten Intervention Program (VIP), WHO’s Multinational Monitoring of Trends and Determinants in Cardiovascular Disease (MONICA) study in northern Sweden and the Mammography Screening Project (MSP) in Västerbotten.
Ascertainment of controls: Individuals free of MI from VIP and MSP
Refs: 26, 27

Study: German CAD
Design: Case-control
Definition of CHD: KORA-MI: Hospitalized survivors of MI who are 26-74 years of age. The diagnosis of a MI (<60) was made with the use of the algorithm of the MONICA project. PopGen CAD: the PopGen CAD sample comprised unrelated German CAD patients with early onset of disease who were recruited in Schleswig-Holstein, Germany (www.PopGen.de). Angio-Lüb: the Lübeck angiographic study (Angio-Lüb) includes patients with angiographically proven CAD who underwent cardiac catheterization at the University Hospital Schleswig-Holstein, Campus Lübeck between 2005 and 2008. Patients were not selected for particular risk factors or phenotypes. Munich-MI: Participants of the Munich MI sample included in this study were consecutively recruited from 1993 to 2002 and examined with coronary angiography at Deutsches Herzzentrum München and 1. Medizinische Klinik rechts der Isar der Technischen Universität München. The diagnosis of MI was established in the presence of chest pain lasting >20 minutes combined with ST-segment elevation or pathological Q waves on a surface electrocardiogram. Patients with MI had to show either an angiographically occluded infarct-related artery or regional wall motion abnormalities corresponding to the electrocardiographic infarct localization, or both.
Ascertainment of controls: Controls were subjects from population-based studies from Germany (PopGen, Heinz-Nixdorf-Recall, KORA).
Refs: 28-32

Study: WTCCC
Design: Case-control
Definition of CHD: CAD cases in the WTCCC Study were from those recruited in the British Heart Foundation Family Heart Study (BHF-FHS) and supplemented by additional cases from WTCCC-CAD2
Ascertainment of controls: Controls were subjects from the UK 1958 Birth Cohort.
Refs: 33, 34


WHI, Women’s Health Initiative; FHS, Framingham Heart Study; MDC-CVA, Malmo Diet and Cancer Study-Cardiovascular Arm; ARIC, Atherosclerosis Risk in Communities Study; IPM, Mt. Sinai
Institute for Personalized Medicine Biobank; ATVB, Italian Atherosclerosis, Thrombosis, and Vascular Biology Study; VHS, Verona Heart Study; Ottawa, Ottawa Heart Study; PROCARDIS,
Precocious Coronary Artery Disease Study; HUNT, Nord-Trøndelag health study; GoDARTS, Genetics of Diabetes Audit and Research Tayside; FIA3, First Myocardial Infarction in AC county 3;
EPIC, European Prospective Study into Cancer and Nutrition; WTCCC, Wellcome Trust Case Control Consortium

MI denotes myocardial infarction; CAD, coronary artery disease


Table S4. Association of individual gene variants and plasma triglycerides in African
Americans


Chromosome,                                                    Minor allele  Protein
position          Gene      N     Beta    Statistic  P         frequency     annotation
chr11_116662407   APOA5     1562  0.165   4.962      7.74E-07  0.07          S19W
chr6_153019197    MYCT1     1564  0.591   4.948      8.33E-07  0.005         T54A
chr16_5140548     FAM86A    1561  1.027   4.443      9.51E-06  0.001         T121A
chr12_106632875   CKAP4     1564  1.015   4.388      1.22E-05  0.001         G579D
chr2_113671410    IL37      1564  -0.078  -4.380     1.27E-05  0.33          T42A
chr1_183514098    SMG7      1564  0.824   4.373      1.31E-05  0.002         P632H
chr16_702524      WDR90     1476  0.146   4.357      1.41E-05  0.07          G371S
chr2_197298051    HECW2     1564  0.759   4.338      1.53E-05  0.002         A33T
chr22_37465121    TMPRSS6   1467  0.326   4.281      1.99E-05  0.02          R711L
chr1_240071937    CHRM3     1564  0.534   4.270      2.07E-05  0.005         L396M
chr19_53014422    ZNF578    1559  -0.130  -4.221     2.58E-05  0.08          I263T
chr19_54652192    CNOT3     1456  0.795   4.211      2.71E-05  0.002         G402S
chr15_89870432    POLG      1561  0.969   4.196      2.88E-05  0.001         A467T
chr6_90408618     MDN1      1564  0.945   4.094      4.46E-05  0.001         E3045G
chr22_29446079    ZNRF3     1353  1.93    4.088      4.61E-05  0.001         H637R


Covariates included age, age², sex, two principal components of ancestry, an indicator variable for race (in race-combined model only) and
indicator variables for sequencing ascertainment scheme.

Table S5. Association of individual gene variants and plasma triglycerides in participants of
European ancestry


Chromosome,                                                                  Minor allele  Protein
position          Gene         N            Beta    Statistic  P         frequency     annotation
chr4_4304605      ZBTB49       2079         0.786   5.085      4.00E-07  0.003         A348T
chr6_39832264     DAAM2        2079         1.054   4.377      1.27E-05  0.001         R105H
chr3_62307648     C3orf14      2074         0.744   4.173      3.13E-05  0.002         L33M
chr11_116701560   APOC3        2075         -0.992  -4.120     3.94E-05  0.001         A43T
chr4_4322570      ZBTB49       2079         0.778   4.103      4.24E-05  0.002         E609K
chr8_121357700    COL14A1      2079         -0.949  -3.970     7.45E-05  0.001         P1659A
chr9_32633036     TAF1L        2079         -0.931  -3.889     0.0001    0.001         D848N
chr20_55941872    RAE1         2079         0.628   3.885      0.0001    0.003         P129S
chr3_49314251     C3orf62      2079         0.925   3.870      0.0001    0.001         R19G
chr4_84384688     FAM175A      2079         -0.725  -3.831     0.0001    0.002         R252Q
chr2_190608005    ANKAR        2079         -0.775  -3.823     0.0001    0.001         R1272H
chr12_10532326    [illegible]  [illegible]  -0.077  -3.811     0.0001    0.21          [illegible]
chr12_57863433    GLI1         2079         -0.482  -3.797     0.0002    0.004         R382W
chr20_58476811    SYCP2        2066         -0.222  -3.780     0.0002    0.02          S363N
chr9_116132334    BSPRY        2079         0.0793  3.772      0.0002    0.19          T374I


Covariates included age, age², sex, two principal components of ancestry, an indicator variable for race (in race-combined model only) and
indicator variables for sequencing ascertainment scheme.

Table S6. Association of individual gene variants and plasma triglycerides in participants of African
American and European ancestry


                                                               Minor allele  Minor allele
                                                               frequency     frequency
Chromosome,                                                    African       European      Protein
position          Gene      N     Beta     Statistic  P         Americans     Americans     annotation
chr11_116662407   APOA5     3728  0.124    5.092      3.71E-07  0.07          0.06          S19W
chr2_27730940     GCKR      3734  0.0686   4.984      6.50E-07  0.1           0.4           L446P
chr6_153019197    MYCT1     3731  0.586    4.470      8.05E-06  0.005         0             T54A
chr20_55941872    RAE1      3734  0.563    4.300      1.75E-05  0.001         0.003         P129S
chr3_62307648     C3orf14   3728  0.648    4.246      2.23E-05  0.0006        0.002         L33M
chr8_19819724     LPL       3734  -0.0891  -4.244     2.25E-05  0.07          0.1           S474X
chr17_38031648    ZPBP2     3734  -0.457   -4.202     2.71E-05  0.0003        0.005         K262E
chr12_57863433    GLI1      3734  -0.476   -4.081     4.57E-05  0.0003        0.004         R382W
chr5_102423628    GIN1      3416  -0.371   -4.051     5.22E-05  0.01          0.0002        N515D
chr4_84384688     FAM175A   3734  -0.724   -4.045     5.34E-05  0             0.002         R252Q
chr8_121292281    COL14A1   3734  0.702    3.918      9.10E-05  0.0006        0.001         A1197T
chr18_65181506    DSEL      3734  -0.698   -3.892     0.0001    0.0003        0.002         A124T
chr2_29259543     FAM179A   3734  -0.183   -3.888     0.0001    0.005         0.02          V852A
chr1_41978890     HIVEP3    3732  0.288    3.881      0.0001    0.01          0             R2001Q
chr11_19955322    NAV2      3724  -0.254   -3.873     0.0001    0.003         0.01          T447M


Covariates included age, age², sex, two principal components of ancestry, an indicator variable for race (in race-combined model only) and
indicator variables for sequencing ascertainment scheme.

Table S7. Gene-based association results aggregating coding sequence variants with minor allele
frequency < 1%


Ancestry  Gene      Full gene name                                                          Location  P
EA        APOC3     apolipoprotein C-III                                                    11q23.3   6.89E-06
EA        C12orf56  chromosome 12 open reading frame 56                                     12q14.2   8.63E-05
AA        GIN1      gypsy retrotransposon integrase 1                                       5q21.1    9.16E-05
AA        MARCH6    membrane-associated ring finger (C3HC4) 6, E3 ubiquitin protein ligase  5p15.2    3.80E-05
Combined  APOC3     apolipoprotein C-III                                                    11q23.3   1.31E-05


Presented here are genes with P < 0.0001 for triglycerides in European ancestry, African American ancestry, and overall.

Table S8. Combined allele frequency of four rare APOC3 loss-of-function mutations


          APOC3 R19X   IVS2+1 G>A   IVS3+1 G>T   A43T                     Combined allele  Combined carrier
Ancestry  rs76353203   rs138326449  rs140621530  rs147210663  Total       frequency        frequency
EA        3/8588       16/8586      1/8590       8/8592       28/8592     0.00326 (1:307)  1:154
AA        0/4402       3/4401       5/4400       7/4402       15/4402     0.00341 (1:293)  1:147

Counts are alternate allele count/total number of chromosomes.
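
The combined frequencies in Table S8 can be reproduced from the allele counts. The sketch below is illustrative only: it assumes the carrier frequency is the Hardy-Weinberg probability of carrying at least one alternate allele, which the table does not state explicitly, and the function name `combined_frequencies` is hypothetical.

```python
def combined_frequencies(alt_alleles, chromosomes):
    """Return (allele frequency q, carrier frequency) for pooled LoF alleles.

    The carrier frequency assumes Hardy-Weinberg proportions:
    P(at least one alternate allele) = 1 - (1 - q)^2.
    """
    q = alt_alleles / chromosomes
    carrier = 1 - (1 - q) ** 2
    return q, carrier


if __name__ == "__main__":
    # Total alternate allele counts and chromosome counts from Table S8.
    for ancestry, alt, total in [("EA", 28, 8592), ("AA", 15, 4402)]:
        q, carrier = combined_frequencies(alt, total)
        print(f"{ancestry}: q = {q:.5f} (1:{1 / q:.0f}), carriers 1:{1 / carrier:.0f}")
```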

Table S9. Association of four APOC3 coding sequence variants and plasma lipid levels


Mutation                               Ancestry       N       TG* Beta (SE)  P TG      LDL-C* Beta (SE)  P LDL-C  HDL-C* Beta (SE)  P HDL-C
R19X                                   EA             33,068  -0.58 (0.09)   1.6e-11   -15.5 (6.4)       0.02     17.1 (2.5)        3.5e-12
R19X                                   AA             5,066   -0.38 (0.21)   0.07      3.1 (19.2)        0.87     -1.2 (1.4)        0.87
IVS2+1 G>A                             EA             14,623  -0.53 (0.07)   7.2e-16   -8.6 (5.4)        0.11     9.0 (1.8)         6.7e-07
IVS2+1 G>A                             AA             2,152   -0.76 (0.14)   8.1e-08   -5.9 (13.6)       0.67     7.4 (4.7)         0.12
A43T                                   EA             24,840  -0.49 (0.13)   2.0e-04   -4.1 (9.3)        0.66     4.5 (3.7)         0.21
A43T                                   AA             7,282   -0.13 (0.08)   0.15      13.4 (7.4)        0.07     6.3 (2.7)         0.02
IVS3+1 G>T                             EA             10,618  -0.43 (0.37)   0.24      13.1 (26.5)       0.62     13.5 (10.6)       0.20
IVS3+1 G>T                             AA             7,279   -0.63 (0.16)   5.4e-05   2.4 (14.2)        0.86     25.5 (5.3)        1.2e-06
Carriers of any of four APOC3 mutations  EA             34,432  -0.55 (0.05)   <1.0e-20  -9.3 (3.4)        5.6e-03  11.5 (1.3)        <1.0e-20
Carriers of any of four APOC3 mutations  AA             7,239   -0.38 (0.06)   1.4e-09   10.7 (5.4)        0.05     9.1 (2.0)         7.5e-06
Carriers of any of four APOC3 mutations  Race-combined  41,671  -0.49 (0.04)   <1.0e-20  -3.8 (2.9)        0.19     10.8 (1.1)        <1.0e-20


TG denotes triglycerides; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol

P values are for the comparison with noncarriers. P values were derived from a linear regression model, with adjustments for age, sex, ancestry, and principal components of ancestry. The P value for
the triglyceride phenotype is based on triglyceride levels logarithmically transformed on a natural log scale.

*Units for TG are ln(triglycerides), for LDL-C is mg/dl, for HDL-C is mg/dl
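
Because the triglyceride betas in Table S9 are on the natural-log scale, they can be read as approximate relative differences between carriers and noncarriers. The helper below is a small illustration of that conversion, not part of the original analysis; the function name `ln_beta_to_percent_change` is hypothetical.

```python
import math


def ln_beta_to_percent_change(beta):
    """Convert a regression beta on the ln(outcome) scale to a percent difference.

    A beta of b on ln(triglycerides) corresponds to a multiplicative
    factor of exp(b), that is, a (exp(b) - 1) * 100 percent difference.
    """
    return (math.exp(beta) - 1.0) * 100.0


if __name__ == "__main__":
    # Race-combined beta for carriers of any of the four mutations (Table S9).
    print(f"{ln_beta_to_percent_change(-0.49):.1f}% relative to noncarriers")
```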

Table S10. Association of APOC3 LoF carrier status with
plasma lipids before and after conditioning on APOA5 S19W


                              Whites                                                Blacks
                              Before accounting         After accounting          Before accounting  After accounting
Outcome variable              for APOA5 S19W            for APOA5 S19W            for APOA5 S19W     for APOA5 S19W
Triglycerides    Beta (SE)    -0.56 (0.12)              -0.56 (0.12)              -0.39 (0.10)       -0.40 (0.10)
                 P            4 x 10^[illegible]        4 x 10^[illegible]        0.0001             9 x 10^[illegible]
HDL cholesterol  Beta (SE)    +12.0 (3.6)               +11.9 (3.6)               +12.5 (3.6)        +12.6 (3.6)
                 P            0.001                     0.001                     0.0006             0.0005
LDL cholesterol  Beta (SE)    -12.4 (8.5)               -12.4 (8.5)               +11.4 (8.4)        +11.2 (8.4)
                 P            0.15                      0.15                      0.18               0.18
n                             10,349                    10,349                    2,932              2,932


Table S11. Association of rare APOC3 mutations and risk for coronary heart disease 


Carriers are carriers of APOC3 R19X, IVS2+1 G>A, IVS3+1 G>T, or A43T. Counts are shown as non-carriers / carriers.

                                                                                 Proportion of cases  Proportion of controls
Study                      Ancestry     Cases              Controls           who carry variant    who carry variant
Study 1 - WHI              EA           [illegible]        [illegible]        0.25%                0.63%
Study 1 - WHI              AA           [illegible]        [illegible]        0%                   0.62%
Study 2 - FHS              [illegible]  [illegible]        [illegible]        0%                   0.20%
Study 3 - MDC-CVA          [illegible]  [illegible]        [illegible]        0.59%                0.35%
Study 4 - ARIC             [illegible]  [illegible]        [illegible]        0.11%                0.19%
Study 4 - ARIC             [illegible]  [illegible]        [illegible]        1.4%                 0.86%
Study 5 - IPM              [illegible]  [illegible]        [illegible]        1.4%                 1.91%
Study 5 - IPM              [illegible]  [illegible]        [illegible]        0.19%                0.37%
Study 5 - IPM              [illegible]  [illegible]        [illegible]        0.54%                0.86%
Study 6 & 7 - ATVB + VHS   [illegible]  [illegible]        [illegible]        0.56%                1.3%
Study 8 - Ottawa           [illegible]  [illegible]        [illegible]        0.29%                0.84%
Study 9 - PROCARDIS        EA           [illegible]        2163 / 7           [illegible]          [illegible]
Study 10 - HUNT            EA           [illegible]        [illegible]        0.21%                0.24%
Study 11 - GoDARTS CAD     [illegible]  [illegible]        [illegible]        0%                   0.17%
Study 12 - EPIC CAD        EA           [illegible]        [illegible]        0.14%                0.14%
Study 13 - FIA3            [illegible]  [illegible]        [illegible]        0%                   0.38%
Study 14 - German CAD      EA           9681 / 37          5769 / 41          0.38%                0.71%
Study 15 - WTCCC           EA           2880 / 13          5884 / 27          0.45%                0.46%
Total                                   33,889 / 113       76,583 / 385       0.33%                0.50%


EA denotes European ancestry; AA, African American; HA, Hispanic ancestry; WHI, Women’s Health Initiative; FHS, Framingham Heart
Study; MDC-CVA, Malmo Diet and Cancer Study-Cardiovascular Arm; ARIC, Atherosclerosis Risk in Communities Study; IPM, Mt. Sinai
Institute for Personalized Medicine Biobank; ATVB, Italian Atherosclerosis, Thrombosis, and Vascular Biology Study; VHS, Verona Heart
Study; Ottawa, Ottawa Heart Study; PROCARDIS, Precocious Coronary Artery Disease Study; HUNT, Nord-Trøndelag health study;
GoDARTS, Genetics of Diabetes Audit and Research Tayside; EPIC, European Prospective Study into Cancer and Nutrition; FIA3,
ForstagangsInsjuknande i hjärtinfarkt i AC-län; WTCCC, Wellcome Trust Case Control Consortium
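
The proportions in the Total row of Table S11 follow directly from the carrier and non-carrier counts. The sketch below recomputes them and also shows a crude pooled odds ratio. That odds ratio is an assumption-laden illustration only: it ignores stratification by study and ancestry and is not the estimate reported by the authors. Function names are hypothetical.

```python
def carrier_proportion(carriers, non_carriers):
    """Fraction of a group that carries any of the four APOC3 mutations."""
    return carriers / (carriers + non_carriers)


def crude_odds_ratio(case_carriers, case_non, control_carriers, control_non):
    """Unstratified odds ratio of disease for carriers versus non-carriers."""
    return (case_carriers / case_non) / (control_carriers / control_non)


if __name__ == "__main__":
    # Totals from Table S11.
    cases = {"carriers": 113, "non_carriers": 33_889}
    controls = {"carriers": 385, "non_carriers": 76_583}

    p_cases = carrier_proportion(cases["carriers"], cases["non_carriers"])
    p_controls = carrier_proportion(controls["carriers"], controls["non_carriers"])
    odds_ratio = crude_odds_ratio(
        cases["carriers"], cases["non_carriers"],
        controls["carriers"], controls["non_carriers"],
    )
    print(f"cases: {p_cases:.2%}, controls: {p_controls:.2%}, crude OR: {odds_ratio:.2f}")
```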

Table S12. Number of individuals expected to be homozygous or compound heterozygous for any of
four APOC3 loss-of-function mutations


              Combined allele  Expected number of   Variance in number of      Standard deviation =
N genotyped   frequency = q    homozygotes = q²*n   homozygotes = q²(1-q²)*n   square root of variance
110,970       1:300            1.23                 1.23                       1.11
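
The arithmetic behind Table S12 can be checked with a few lines. This is a minimal sketch that treats the number of homozygotes (or compound heterozygotes) as binomial with success probability q², as the column formulas imply; the function name `expected_homozygotes` is hypothetical.

```python
import math


def expected_homozygotes(n_genotyped, q):
    """Return (expected count, variance, standard deviation).

    Each individual is homozygous or compound heterozygous with
    probability q^2, so the count is Binomial(n, q^2).
    """
    p = q ** 2
    expected = p * n_genotyped
    variance = p * (1 - p) * n_genotyped
    return expected, variance, math.sqrt(variance)


if __name__ == "__main__":
    expected, variance, sd = expected_homozygotes(110_970, 1 / 300)
    print(f"expected = {expected:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
```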

Table S13. Association of APOC3 LoF mutations with CT hepatic fat in 3,051 Framingham
Heart Study participants


Outcome variable  Predictor variable                Covariates                                          Beta (SE)     P
CT hepatic fat    APOC3 R19X or IVS2+1 G>A (n=27)   age, age², gender                                   -0.04 (0.19)  0.82
CT hepatic fat    APOC3 R19X or IVS2+1 G>A (n=27)   age, age², gender, # of alcoholic drinks per week   -0.04 (0.19)  0.84


Table S14. Correlation of plasma apolipoprotein C-III level with plasma lipids, apolipoproteins, and 
cardiovascular risk factors in the Framingham Heart Study Offspring Cohort 


Variable Correlation coefficient Pr(>|t|)

Total cholesterol 0.473 2.26E-180 
High density lipoprotein cholesterol -0.135 1.25E-14 
Low-density lipoprotein cholesterol 0.233 5.01E-40 
Triglycerides 0.752 <1E-222 
Log (Triglycerides) 0.789 <1E-222 
Body mass index 0.200 1.42E-30 
Fasting glucose 0.271 2.31E-55 
Intermediate-density lipoprotein determined by NMR, Exam 4 0.274 4.92E-48 
VLDL size determined by NMR, Exam 4 0.390 6.15E-100 
LDL size determined by NMR, Exam 4 -0.327 3.20E-69 
HDL size determined by NMR, Exam 4 -0.178 6.77E-21 
Large VLDL particles determined by NMR, Exam 4 0.445 1.34E-132 
Medium VLDL particles determined by NMR, Exam 4 0.557 1.21E-222 
Small VLDL particles determined by NMR, Exam 4 0.135 1.41E-12 
Large LDL particles determined by NMR, Exam 4 -0.114 2.47E-09 
Medium LDL particles determined by NMR, Exam 4 0.249 9.21E-40 
Small LDL particles determined by NMR, Exam 4 0.329 1.02E-69 
Large HDL particles determined by NMR, Exam 4 -0.135 1.60E-12 
Medium HDL particles determined by NMR, Exam 4 0.209 3.14E-28 
Small HDL particles determined by NMR, Exam 4 0.094 9.33E-07 
Apolipoprotein AI concentration by ELISA (mg/dl), Exam 4 0.132 6.79E-14 
Apolipoprotein AII concentration by ELISA (mg/dl), Exam 4 0.294 4.02E-65 
Apolipoprotein B concentration by ELISA (mg/dl), Exam 4 0.359 4.26E-98 
Cholesterol in remnant like particles in mg/dl, Exam 4 0.421 2.12E-106 
Triglycerides in remnant like particles in mg/dl, Exam 4 0.365 5.35E-76 
Systolic blood pressure 0.249 7.63E-47 
Diastolic blood pressure 0.172 7.17E-23 
Log (C-reactive protein), exam 5 0.174 9.53E-30 
Log (C-reactive protein), exam 6 0.193 1.90E-24 
Sex 0.019 0.276 
Age 0.200 2.05E-30 


Correlations are unadjusted. All measurements are made in exam cycle 5 unless specified. 
VLDL denotes very-low density lipoprotein; NMR, nuclear magnetic resonance; LDL, low-density lipoprotein; HDL, high-density lipoprotein 

Table S15. Association of continuous plasma apolipoprotein C-III levels with
incident CHD in the Framingham Heart Study


Model Beta SE OR 95% CI lower 95% CI upper P
1 0.044 0.011 1.045 1.023 1.068 4.90E-05 
2 0.017 0.015 1.017 0.988 1.047 0.26 


Model 1 covariates include age and sex 
Model 2 covariates include age, sex, smoking, diabetes mellitus, LDL cholesterol, HDL cholesterol, hypertension treatment, 
alcohol consumption, systolic and diastolic blood pressure, lipid-lowering treatment, and fasting serum glucose 
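
The OR and confidence interval columns in Table S15 are consistent with exponentiating the beta and a Wald interval around it. The sketch below shows that calculation under the assumption of a normal-approximation 95% interval; it is an illustration, not the authors' code, and the function name `odds_ratio_with_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_with_ci(beta, se, level=0.95):
    """Return (OR, lower, upper) from a log-odds beta and its standard error.

    Uses a Wald interval: exp(beta +/- z * se), with z taken from the
    standard normal distribution for the requested confidence level.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


if __name__ == "__main__":
    # Model 1 and Model 2 estimates from Table S15.
    for model, beta, se in [(1, 0.044, 0.011), (2, 0.017, 0.015)]:
        odds_ratio, lower, upper = odds_ratio_with_ci(beta, se)
        print(f"Model {model}: OR {odds_ratio:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The same conversion applies to the tertile comparisons and Verona Heart Study models in the tables that follow.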




Table S16. Association of tertiles of plasma apolipoprotein C-III levels with incident events in the
Framingham Heart Study Offspring cohort


Model  Comparison                       Beta    SE     OR         95% CI lower  95% CI upper  P
1      Lowest third vs. Highest third   -0.435  0.148  [missing]  0.484         0.865         0.003
1      Middle third vs. Highest third   -0.174  0.133  [missing]  0.647         1.091         0.19
2      Lowest third vs. Highest third   -0.117  0.159  0.890      0.651         1.214         0.46
2      Middle third vs. Highest third   0.029   0.141  1.029      0.780         1.357         0.84


Model 1: age and sex
Model 2: age, sex, smoking, diabetes mellitus, LDL cholesterol, HDL cholesterol, hypertension treatment, alcohol consumption, systolic and
diastolic blood pressure, lipid-lowering treatment, and fasting serum glucose

Table S17. Association of continuous plasma apolipoprotein C-III levels with total
mortality in CAD patients in the Verona Heart Study


Model Beta SE OR 95% CI lower 95% CI upper P 
1 0.078 0.016 1.081 1.047 1.116 2E-06 
2 0.107 0.025 1.113 1.059 1.168 2E-05 


Model 1 covariates include age and sex 
Model 2 covariates include age, sex, diabetes mellitus, hypertension, LDL cholesterol, HDL cholesterol, lipid- 
lowering treatment, and fasting serum glucose 

Table S18. Association of tertiles of plasma apolipoprotein C-III levels with total mortality in CAD
patients in the Verona Heart Study


Model  Comparison                       Beta    SE     OR     95% CI lower  95% CI upper  P
1      Lowest third vs. Highest third   -0.677  0.211  0.508  0.336         0.769         0.001
1      Middle third vs. Highest third   -0.565  0.215  0.569  0.373         0.867         0.009
2      Lowest third vs. Highest third   -0.774  0.317  0.461  0.248         0.858         0.015
2      Middle third vs. Highest third   -0.819  0.301  0.441  0.244         0.795         0.006


Model 1: age and sex
Model 2 covariates include age, sex, diabetes mellitus, hypertension, LDL cholesterol, HDL cholesterol, lipid-lowering treatment, and fasting
serum glucose

Table S19. Association of continuous plasma apolipoprotein C-III levels with
cardiovascular mortality in CAD patients in the Verona Heart Study


Model Beta SE OR 95% CI lower 95% CI upper P
1 0.069 0.020 1.071 1.029 1.115 0.001 
2 0.088 0.033 1.092 1.023 1.165 0.008 


Model 1 covariates include age and sex 
Model 2 covariates include age, sex, diabetes mellitus, hypertension, LDL cholesterol, HDL cholesterol, lipid-lowering treatment, 
and fasting serum glucose 

Table S20. Association of tertiles of plasma apolipoprotein C-III levels with cardiovascular mortality in
CAD patients in the Verona Heart Study


Model  Comparison                       Beta    SE     OR     95% CI lower  95% CI upper  P
1      Lowest third vs. Highest third   -0.673  0.268  0.510  0.302         0.862         0.012
1      Middle third vs. Highest third   -0.246  0.248  0.782  0.481         1.271         0.321
2      Lowest third vs. Highest third   -0.850  0.399  0.427  0.195         0.934         0.033
2      Middle third vs. Highest third   -0.523  0.341  0.593  0.304         1.156         0.125


Model 1: age and sex
Model 2 covariates include age, sex, diabetes mellitus, hypertension, LDL cholesterol, HDL cholesterol, lipid-lowering treatment, and fasting
serum glucose